This document provides an introduction of the R/Biocondcutor ELMER package, which is designed to combine DNA methylation and gene expression data from human tissues to infer multi-level cis-regulatory networks. ELMER uses DNA methylation to identify enhancers, and correlates enhancer state with expression of nearby genes to identify one or more transcriptional targets. Transcription factor (TF) binding site analysis of enhancers is coupled with expression analysis of all TFs to infer upstream regulators. This package can be easily applied to TCGA public available cancer data sets and custom DNA methylation and gene expression data sets.
ELMER analyses have 5 main steps:
The package workflow is showed in the figure below:
ELMER workflow: ELMER receives as input a DNA methylation object, a gene expression object (both can be either a matrix or a SummarizedExperiment object) and a Genomic Ranges (GRanges) object with distal probes to be used as filter which can be retrieved using the get.feature.probe function. The function createMAE will create a Multi Assay Experiment object keeping only samples that have both DNA methylation and gene expression data. Genes will be mapped to genomic position and annotated using ENSEMBL database, while for probes it will add annotation from (http://zwdzwd.github.io/InfiniumAnnotation). This MAE object will be used as input to the next analysis functions. First, it identifies differentially methylated probes followed by the identification of their nearest genes (10 upstream and 10 downstream) through the get.diff.meth and GetNearGenes functions respectively. For each probe it will verify if any of the nearby genes were affected by its change in the DNA methylation level and a list of gene and probes pairs will be outputted from get.pair function. For the probes in those pairs, it will search for enriched regulatory Transcription Factors motifs with the get.enriched.motif function. Finally, the enriched motifs will be correlate with the level of the transcription factor through the get.TFs function. In the figure green Boxes represents user input data, blue boxes represents output object, orange boxes represents auxiliary pre-computed data and gray boxes are functions.
To install this package, start R and enter:
devtools::install_github(repo = "tiagochst/ELMER.data")
devtools::install_github(repo = "tiagochst/ELMER")Then, to load ELMER enter:
If you used ELMER package or its results, please cite:
If you get TCGA data using getTCGA function, please cite TCGAbiolinks package:
The following steps can be used to download the example data set for ELMER.
TCGA.pipe is a function for easily downloading TCGA data and performing all the analyses in ELMER. For illustration purpose, we skip the downloading step. The user can use the getTCGA function to download TCGA data or use TCGA.pipe by including “download” in the analysis option.
The following command will do distal DNA methylation analysis and predict putative target genes, motif analysis and identify regulatory transcription factors.
TCGA.pipe("LUSC",
wd = "./ELMER.example",
cores = parallel::detectCores()/2,
mode = "unsupervised"
permu.size = 300,
Pe = 0.01,
analysis = c("distal.probes","diffMeth","pair","motif","TF.search"),
diff.dir = "hypo",
rm.chr = paste0("chr",c("X","Y")))In this new version we added the argument mode in the TCGA.pipe function. This will automatically set the minSubgroupFrac to the following values:
Modes available:
unsupervised:minSubgroupFrac = 0.2 in get.diff.meth)minSubgroupFrac = 0.4 in get.pairs and get.TFs functions)supervised:The unsupervised mode should be used when want to be able to detect a specific (possibly unknown) molecular subtype among tumor; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicated, use supervised mode (use all samples)
For more informtaion continue reading the vignette.
A Multi Assay Experiment object is the input for all main functions of ELMER and can be generated by createMAE function.
To perform ELMER analyses, the Multi Assay Experiment needs:
If TCGA data are used, the the last two matrices will be automatically generated. Based on the genome of reference selected, metadata for the DNA methylation probes, such as genomic coordinates, will be added from Wanding Zhou annotation(Zhou, Laird, and Shen 2016); and metadata for gene annotation will be added from ensemble database (Yates et al. 2015) using biomaRt (Durinck et al. 2009).
DNA methylation data feeding to ELMER should be a matrix of DNA methylation beta (\(\beta\)) value for samples (column) and probes (row) processed from row HM450K array data. If TCGA data is used, processed data from GDC website will be downloaded
and automatically transformed to the matrix by ELMER. The processed TCGA DNA methylation data were calculated as (M/(M+U)), where M represents the methylated allele intensity and U the unmethylated allele intensity. Beta values range from 0 to 1, reflecting the fraction of methylated alleles at each CpG in the each tumor; beta values close to 0 indicates low levels of DNA methylation and beta values close to 1 indicates high levels of DNA methylation.
If user have raw HM450K data, these data can be processed by Methylumi or minfi generating DNA methylation beta (\(\beta\)) value for each CpG site and multiple samples. The getBeta function in minfi can be used to generate a matrix of DNA methylation beta (\(\beta\)) value to feed in ELMER. And we recommend to save this matrix as meth.rda since createMAE can read in files by specifying their path which will help to reduce memory usage.
# Example of DNA methylation data input
datatable(Meth[1:10, 1:10],
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = TRUE)Gene expresion data feeding to ELMER should be a matrix of gene expression values for samples (column) and genes (row). Gene expression value can be generated from different platforms: array or RNA-seq. The row data should be processed by other software to get gene or transcript level gene expression calls such as mapping by tophat, calling expression value by cufflink, RSEM or GenomeStudio for expression array. It is recommended to normalize expression data making gene expression comparable across samples such as quantile normalization. User can refer TCGA RNA-seq analysis pipeline to do generate comparable gene expression data. Then transform the gene expression values from each sample to the matrix for feeding into ELMER. If users want to use TCGA data, ELMER has functions to download the RNA-Seq Quantification data(HTSeq-FPKM-UQ) from GDC website and transform the data to the matrix for feeding into ELMER. It is recommended to save this matrix as RNA.rda since createMAE can read in files by specifying the path of files which will help to reduce memory usage.
# Example of Gene expression data input
datatable(GeneExp[1:10, 1:2],
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = TRUE)Sample information should be stored as a data.frame object containing sample ID, group labels (control and experiment). Sample ID and groups labels are required. Other information for each sample can be added to this data.frame object. When TCGA data were used, samples information will be automatically generated by createMAE function by specifying option TCGA=TRUE. A columns name TN will create the groups Tumor and Normal using the following samples to each group:
Tumor samples are:
Normal samples:
library(MultiAssayExperiment)
data <- createMAE(exp = GeneExp,
met = Meth,
met.platform = "450K",
genome = "hg19",
save = FALSE,
TCGA = TRUE)
dataA MultiAssayExperiment object of 2 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 2:
[1] DNA methylation: RangedSummarizedExperiment with 1707 rows and 234 columns
[2] Gene expression: RangedSummarizedExperiment with 3808 rows and 234 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert ExperimentList into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
as.data.frame(colData(data)[,c("patient","definition","TN")])# Adding sample information for non TCGA samples
# You should have two objects with one for DNA methylation and
# one for gene expression. They should have the same number of samples and the names of the
# sample in the gene expression object and in hte DNA methylation matrix
# should be the same
not.tcga.exp <- GeneExp # 234 samples
colnames(not.tcga.exp) <- substr(colnames(not.tcga.exp),1,15)
not.tcga.met <- Meth # 268 samples
colnames(not.tcga.met) <- substr(colnames(not.tcga.met),1,15)
# Number of samples in both objects (234)
table(colnames(not.tcga.met) %in% colnames(not.tcga.exp))
FALSE TRUE
34 234
# Our sample information must have as row names the samples information
phenotype.data <- data.frame(row.names = colnames(not.tcga.exp),
samples = colnames(not.tcga.exp),
group = c(rep("group1", ncol(GeneExp)/2),
rep("group2", ncol(GeneExp)/2)))
data.hg19 <- createMAE(exp = not.tcga.exp,
met = not.tcga.met,
TCGA = FALSE,
met.platform = "450K",
genome = "hg19",
colData = phenotype.data)
data.hg19A MultiAssayExperiment object of 2 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 2:
[1] DNA methylation: RangedSummarizedExperiment with 1707 rows and 234 columns
[2] Gene expression: RangedSummarizedExperiment with 3808 rows and 234 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert ExperimentList into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
# The samples that does not have data for both DNA methylation and Gene exprssion will be removed even for the phenotype data
phenotype.data <- data.frame(row.names = colnames(not.tcga.met),
samples = colnames(not.tcga.met),
group = c(rep("group1", ncol(Meth)/4),
rep("group2", ncol(Meth)/4),
rep("group3", ncol(Meth)/4),
rep("group4", ncol(Meth)/4)))
data.hg38 <- createMAE(exp = not.tcga.exp,
met = not.tcga.met,
TCGA = FALSE,
save = FALSE,
met.platform = "450K",
genome = "hg38",
colData = phenotype.data)
data.hg38A MultiAssayExperiment object of 2 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 2:
[1] DNA methylation: RangedSummarizedExperiment with 1702 rows and 234 columns
[2] Gene expression: RangedSummarizedExperiment with 3742 rows and 234 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert ExperimentList into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
as.data.frame(colData(data.hg38)[1:20,])Probe information is stored as a GRanges object containing the coordinates of each probe on the DNA methylation array and names of each probe. The default probe information is fetching from Wanding Zhou annotation(Zhou, Laird, and Shen 2016)
library(SummarizedExperiment, quietly = TRUE)
rowRanges(getMet(data))[1:3,1:8]GRanges object with 3 ranges and 8 metadata columns:
seqnames ranges strand | addressA addressB
<Rle> <IRanges> <Rle> | <numeric> <numeric>
cg00045114 chr1 [172674159, 172674160] * | 52756415 <NA>
cg00050294 chr1 [ 2886818, 2886819] * | 51670436 <NA>
cg00066722 chr1 [ 43634520, 43634521] * | 67752427 <NA>
channel designType nextBase nextBaseRef probeType
<character> <factor> <character> <character> <character>
cg00045114 Both II G/A C cg
cg00050294 Both II G/A C cg
cg00066722 Both II G/A C cg
orientation
<factor>
cg00045114 down
cg00050294 down
cg00066722 up
-------
seqinfo: 25 sequences from an unspecified genome; no seqlengths
Gene information is stored as a GRanges object containing coordinates of each gene, gene id, gene symbol and gene isoform id. The default gene information is the ensembl gene annotation fetched from biomaRt by ELMER function.
rowRanges(getExp(data))GRanges object with 3808 ranges and 2 metadata columns:
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
ENSG00000188984 chr1 [ 12776118, 12788726] + |
ENSG00000204518 chr1 [ 12704566, 12727097] + |
ENSG00000108270 chr17 [ 35306175, 35414171] + |
ENSG00000198691 chr1 [ 94458393, 94586688] - |
ENSG00000135776 chr1 [229652329, 229694442] - |
... ... ... ... .
ENSG00000132003 chr19 [13906274, 13943044] + |
ENSG00000162415 chr1 [45482071, 45771881] - |
ENSG00000203995 chr1 [53308183, 53360670] + |
ENSG00000162378 chr1 [53192126, 53293014] + |
ENSG00000036549 chr1 [78028101, 78149104] - |
external_gene_name ensembl_gene_id
<character> <character>
ENSG00000188984 AADACL3 ENSG00000188984
ENSG00000204518 AADACL4 ENSG00000204518
ENSG00000108270 AATF ENSG00000108270
ENSG00000198691 ABCA4 ENSG00000198691
ENSG00000135776 ABCB10 ENSG00000135776
... ... ...
ENSG00000132003 ZSWIM4 ENSG00000132003
ENSG00000162415 ZSWIM5 ENSG00000162415
ENSG00000203995 ZYG11A ENSG00000203995
ENSG00000162378 ZYG11B ENSG00000162378
ENSG00000036549 ZZZ3 ENSG00000036549
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
A Multi Assay Experiment object from the MultiAssayExperiment package is the input for multiple main functions of ELMER. It contains the above components and making a Multi Assay Experiment object by createMAE function will keep each component consistent with each other. For example, althougth DNA methylation and gene expression matrixes have different rows (probe for DNA methylation and gene id for gene expression), the column (samples) order should be same in the two matrixes. The createMAE function will keep them consistent when it generates the Multi Assay Experiment object.
data <- createMAE(exp = GeneExp,
met = Meth,
genome = "hg19",
save = FALSE,
met.platform = "450K",
TCGA = TRUE)
# For TGCA data 1-12 represents the patient and 1-15 represents the sample ID (i.e. primary solid tumor samples )
all(substr(colnames(getExp(data)),1,15) == substr(colnames(getMet(data)),1,15))[1] TRUE
# See sample information for data
as.data.frame(colData(data))# See sample names for each experiment
as.data.frame(sampleMap(data))You can also use your own data and annotations to create Multi Assay Experiment object.
# NON TCGA example: matrices has diffetrent column names
gene.exp <- DataFrame(sample1 = c("ENSG00000141510"=2.3,"ENSG00000171862"=5.4),
sample2 = c("ENSG00000141510"=1.6,"ENSG00000171862"=2.3))
dna.met <- DataFrame(sample1 = c("cg14324200"=0.5,"cg23867494"=0.1),
sample2 = c("cg14324200"=0.3,"cg23867494"=0.9))
sample.info <- DataFrame(sample.type = c("Normal", "Tumor"))
rownames(sample.info) <- colnames(gene.exp)
mae <- createMAE(exp = gene.exp,
met = dna.met,
save = FALSE,
met.platform = "450K",
colData = sample.info,
genome = "hg38")
# NON TCGA example: matrices has diffetrent column names
rownames(sample.info) <- c("sample1","sample2")
sampleMap <- DataFrame(primary = c("sample1","sample1","sample2","sample2"),
colname = c("sample1.exp","sample1.met","sample2.exp","sample2.met"))
mae <- createMAE(exp = gene.exp,
met = dna.met,
sampleMap = sampleMap,
colData = sample.info,
save = FALSE,
met.platform = "450K",
genome = "hg38") The function getTCGA uses TCGAbiolinks package (A. Colaprico et al. 2015) to download TCGA data for all samples for a given disease (such as BLCA, LGG, GBM). Its main arguments are the genome that if set to “hg19” it will download data from GDC legacy archive and if set to “hg38” it will download data from the GDC harmonized data portal.
getTCGA(disease = "LUSC", # TCGA disease abbreviation (BRCA,BLCA,GBM, LGG, etc)
basedir = "DATA", # Where data will be downloaded
genome = "hg38") # Genome of refenrece "hg38" or "hg19"If the getTCGA function called before was successful it will create the following objects and folders:
--- DATA/BRCA/
|----------- BRCA_meth_hg38.rda (object with DNA methylation)
|----------- BRCA_RNA_hg38.rda (object with gene expression)
|----------- BRCA_clinic.rda (object with indexed clinical information)
|----------- Raw/ (folder: contains All raw data from GDC)The example data set is a subset of chromosome 1 data from TCGA LUSC. ELMER analysis have 5 main steps which are shown below individually. TCGA.pipe introduced above is a pipeline combining all 5 steps and producing all results and figures.
This step is to select HM450K/EPIC probes, which locate far from TSS (at least 2Kb away) These probes are called distal probes.
Be default, this comprehensive list of TSS annotated by ENSEMBL database, which is programatically accessed using biomaRt to get its last version, will be used to select distal probes. But user can use their own TSS annotation or add features such as H3K27ac ChIP-seq in a certain cell line, to select probes overlapping thoses features regions.
# get distalprobes that are 2kb away from TSS on chromosome 1
Probe <- get.feature.probe(genome = "hg19", met.platform = "450K", rm.chr = paste0("chr",c(2:22,"X","Y")))
save(Probe,file = "result/probeInfo_feature_distal.rda")This step is to identify DNA methylation changes at distal enhancer probes which is carried out by function get.diff.meth.
For each distal probe, the function first rank samples from group 1 and group 2 samples by their DNA methylation beta values. To identify hypomethylated probes, the function compared the lower control quintile (20% of control samples with the lowest methylation) to the lower experiment quintile (20% of experiment samples with the lowest methylation), using an unpaired one-tailed t-test. Only the lower quintiles were used because we did not expect all cases to be from a single molecular subtype, and we sought to identify methylation changes within cases from the same molecular subtype. 20% (i.e. a quintile) was picked as a cutoff to include high enough sample numbers to yield t-test p-values that could overcome multiple hypothesis correction, yet low enough to be able to capture changes in individual molecular subtypes occurring in 20% or more of the cases. This number can be set arbitrarily as an input to the get.diff.meth function in the ELMER, and should be tuned based on sample sizes in individual studies. The one tailed t-test was used to rule out the null hypothesis: \(\mu_{experiment} \geq \mu_{control}\), where \(\mu_{experiment}\) is the mean methylation within the lowest experiment quintile and \(\mu\)control is the mean within the lowest control quintile. Raw p-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg method, and probes were selected when they had adjusted p-value less than 0.01. For additional stringency, probes were only selected if the methylation difference: \(\Delta = \mu_{experiment} - \mu_{control}\) was greater than 0.3. The same method was used to identify hypermethylated probes, except we used upper experiment quintile and upper control quintile, and chose the opposite tail in the t-test.
If save parameter of get.diff.meth is TRUE, two cvs files will be saved. If false, a data frame with the same content as the second file mentioned below will be reported.
The first file contains all statistic results for all probes which were fed into the function. Based on this file, user can change different P value or sig.dif cutoff to select the significant results without redo the analysis.
The second file contains statistic results for the probes that pass the significant criteria (P value and sig.dir).
Both files contain four columns: probe, pvalue, ExperimentMinControl, adjust.p.
The main arguments of this function are:
| Argument | Description |
|---|---|
| data | A multiAssayExperiment with DNA methylation and Gene Expression data. See createMAE function. |
| diff.dir | A character can be “hypo” or “hyper”, showing differential methylation dirction. It can be “hypo” which is only selecting hypomethylated probes; “hyper” which is only selecting hypermethylated probes |
| cores | A interger which defines the number of cores to be used in parallel process. Default is 1: no parallel process. |
| minSubgroupFrac | A number ranging from 0 to 1,specifying the fraction of extreme samples from group 1 and group 2 that are used to identify the differential DNA methylation. The default is 0.2 because we typically want to be able to detect a specific (possibly unknown) molecular subtype among tumor; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicated, use a value of 1.0 (Supervised mode) |
| pvalue | A number specifies the significant P value (adjusted P value by BH) cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.01 |
| group.col | A column defining the groups of the sample. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)). |
| group1 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |
| group2 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |
| test | Statistical test to be used. Options: t.test (DEFAULT), wilcox.test |
| sig.dif | A number specifies the smallest DNA methylation difference as a cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.3. |
| dir.out | A path specify the directory for outputs. Default is is current directory. |
| save | A logic. When TRUE, two getMethdiff.XX.csv files will be generated (see detail) |
library(MultiAssayExperiment)
data <- createMAE(exp = GeneExp,
met = Meth,
save = FALSE,
met.platform = "450K",
genome = "hg19",
TCGA = TRUE)
as.data.frame(colData(data)[1:5,])sig.diff <- get.diff.meth(data,
group.col = "definition",
group1 = "Primary solid Tumor",
group2 = "Solid Tissue Normal",
minSubgroupFrac = 0.2,
sig.dif = 0.3,
diff.dir = "hypo", # Search for hypomethylated probes in group 1
cores = parallel::detectCores()/2,
dir.out ="result",
pvalue = 0.01)
as.data.frame(sig.diff)# get.diff.meth automatically save output files.
# getMethdiff.hypo.probes.csv contains statistics for all the probes.
# getMethdiff.hypo.probes.significant.csv contains only the significant probes which
# is the same with sig.diff
dir(path = "result", pattern = "getMethdiff") [1] "getMethdiff.hypo.probes.csv"
[2] "getMethdiff.hypo.probes.significant.csv"
getMethdiff.hypo.probes.csv
getMethdiff.hypo.probes.significant.csv
As stated before the argument minSubgroupFrac (a number ranging from 0 to 1) controls the number of samples to be used from both groups when you are comparing the DNA methylation level at each probe. When comparing Primary Solid tumor samples versus Normal tissue samples only the lower quintiles were used because we did not expect all cases to be from a single molecular subtype, and we sought to identify methylation changes within cases from the same molecular subtype. But if you are dealing already with subtypes it is better to set minSubgroupFrac to 1.
The code example below highlight some different probes founds when using 100% or 20%.
# We will evaluate the different probes found using different percentage values.
groupCol <- "subtype_Expression.Subtype"
group1 <- "classical"
group2 <- "secretory"
# For each probe use lower 20% samples
sig.diff.20 <- get.diff.meth(data,
group.col = groupCol,
group1 = group1,
group2 = group2,
minSubgroupFrac = 0.2,
sig.dif = 0.3,
diff.dir = "hypo",
pvalue = 0.01)
# For each probe use all samples
sig.diff.100 <- get.diff.meth(data,
group.col = groupCol,
group1 = group1,
group2 = group2,
minSubgroupFrac = 1.0,
sig.dif = 0.3,
diff.dir = "hypo",
pvalue = 0.01)# Which probes were found with 20% of samples and not with 100%
probe <- sig.diff.20[which(!sig.diff.20$probe %in% sig.diff.100$probe),"probe"]
metBoxPlot(data,groupCol,group1,group2,probe = probe[1],minSubgroupFrac = 0.2,
diff.dir = "hypo", title = "probes were found with 20% of samples and not with 100%")# Which probes were found with 100% of samples and not with 20%
probe <- sig.diff.100[which(!sig.diff.100$probe %in% sig.diff.20$probe),"probe"]
metBoxPlot(data,groupCol,group1,group2,probe = probe[1], minSubgroupFrac = 0.2,
diff.dir = "hypo", title = "probes were found with 100% of samples and not with 20%")This step is to link enhancer probes with methylation changes to target genes with expression changes and report the putative target gene for selected probes. This is carried out by function get.pair.
First, the function get.pair for additional stringency and to avoid correlations due to non-cancer contamination, selects only those distal probes that had differential methylation as defined above, and where at least 5% of all samples (combining tumor and normal) had beta values > 0.3. We usually call locus unmethylated when the methylation value < 0.3 and methylated when the methylation value > 0.3. Basically, this step will make sure we have at least a percentage of beta values lesser than K and n percentage of beta values greater K. For example, if percentage is 5%, the number of samples 100 and K = 0.3, this filter will select probes that we have at least 5 (5% of 100%) samples have beta values > 0.3 and at least 5 samples have beta values < 0.3. This filter is important as true promoters and enhancers usually have a pretty low value (of course purity can screw that up).
we often see lots of PMD probes across the genome with intermediate values like 0.4.
Choosing a value of 0.3 will certainly give some false negatives, but not compared to the number of false positives we thought we might get without this filter. Please see the documentation for the function preAssociationProbeFiltering.
After, for each distal probe with differential methylation and passing this filter, the closest 10 upstream genes and the closest 10 downstream genes were tested for correlation between methylation of the probe and expression of the gene. To select these genes, the probe-gene distance was defined as the distance from the probe to a transcription start site specified by the ENSEMBL gene annotation. Thus, exactly 20 statistical tests were performed for each probe, as follows. For each probe-gene pair, the samples (all experiment samples and control samples) were divided into two groups: the M group, which consisted of the upper methylation quintile (the 20% of samples with the highest methylation at the enhancer probe), and the U group, which consisted of the lowest methylation quintile (the 20% of samples with the lowest methylation.) The 20% ile cutoff is a configurable parameter in the get.pair. Default is 20% as a balance, which would allow us to identify changes in a molecular subtype making up a minority (i.e. 20%) of cases, while also yielding enough statistical power to make strong predictions. For each candidate probe-gene pair, the Mann-Whitney U test was used to test the null hypothesis that overall gene expression in group M was greater or equal than that in group U. This non-parametric test was used in order to minimize the effects of expression outliers, which can occur across a very wide dynamic range. For each probe-gene pair tested, the raw p-value Pr was corrected for multiple hypothesis using a permutation approach as follows (implemented in the get.permu function of the ELMER package). The gene in the pair was held constant, and x random methylation probes were used to perform the same one-tailed U test, generating a set of x permutation p-values (Pp). We chose the x random probes only from among those that were “distal” (greater than 2kb from an annotated transcription start site), in order to make these null-model probes qualitatively similar to the probe being tested. An empirical p-value \(P_e\) value was calculated using the following formula (which introduces a pseudo-count of 1):
\[Pe = \frac{num(P_p \leq P_r)+ 1}{x+1}\]
This step is the most time consuming step since it requires a large amount calculations for permutation. The greater the permutation time is, the longer it will take. It is recommended to use multiple cores for this step. 10,000 permutations is recommended if high confidence results are desired but it may cost some hours.
If save parameter of get.pair function is true, two cvs files will be output. If save parameter is false, a data frame with the same contect as the second file mentioned as below will be output.
The first file contains all statistic results for all probe-gene pairs. Based on this file, user can change different P value or sig.dir cutoff to select the significant results without redo the analysis.
The second file contains statistic results for the probes that pass the significant criteria (Pe).
Both files contain four columns: probe, GeneID, Symbol, Distance, Sides, Raw.p, Pe.
The main arguments of this function are:
| Argument | Description |
|---|---|
| data | A multiAssayExperiment with DNA methylation and Gene Expression data. See createMAE function. |
| nearGenes | Can be either a list containing output of GetNearGenes function or path of rda file containing output of GetNearGenes function. |
| minSubgroupFrac | A number ranging from 0 to 1.0 specifying the percentage of samples used to create groups U (unmethylated) and M (methylated) used to link probes to genes. Default is 0.4 (lowest quintile samples will be in the U group and the highest quintile samples in the M group). |
| permu.size | A number specify the times of permuation. Default is 10000. |
| permu.dir | A path where the output of permutation will be. |
| pvalue | A number specify the raw p-value cutoff for defining signficant pairs. Default is 0.05. It will select the significant P value cutoff before calculating the empirical p-values. |
| Pe | A number specify the empirical p-value cutoff for defining signficant pairs. Default is 0.001. |
| dir.out | A path specify the directory for outputs. Default is current directory |
| diffExp | A logic. Default is FALSE. If TRUE, t test will be applied to test whether putative target gene are differentially expressed between two groups. |
| group.col | A column defining the groups of the sample. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)). |
| group1 | A group from group.col. |
| group2 | A group from group.col. |
| cores | A interger which defines number of core to be used in parallel process. Default is 1: don’t use parallel process. |
| filter.probes | Should filter probes by selecting only probes that have at least a certain number of samples below and above a certain cut-off. See preAssociationProbeFiltering function. |
| filter.portion | A number specify the cut point to define binary methlation level for probe loci. Default is 0.3. When beta value is above 0.3, the probe is methylated and vice versa. For one probe, the percentage of methylated and unmethylated samples should be above filter.percentage value. Only used if filter.probes is TRUE. See preAssociationProbeFiltering function. |
| filter.percentage | Minimum percentage of samples to be considered in methylated and unmethylated for the filter.portion option. Default 5%. Only used if filter.probes is TRUE. See preAssociationProbeFiltering function. |
| label | A character labels the outputs. |
| save | Two files will be saved if save is true: getPair.XX.all.pairs.statistic.csv and getPair.XX.pairs.significant.csv (see detail). |
### identify target gene for significantly hypomethylated probes.
Sig.probes <- read.csv("result/getMethdiff.hypo.probes.significant.csv",
stringsAsFactors=FALSE)[,1]
head(Sig.probes) # significantly hypomethylated probes[1] "cg00045114" "cg00050294" "cg00093522" "cg00163018" "cg00173804"
[6] "cg00223046"
cg00045114
cg00050294
cg00093522
cg00163018
cg00173804
cg00223046
## Collect nearby 20 genes for Sig.probes
nearGenes <- GetNearGenes(data = data,
probes = Sig.probes,
numFlankingGenes = 20, # 10 upstream and 10 dowstream genes
cores = parallel::detectCores()/2)
## Identify significant probe-gene pairs
Hypo.pair <- get.pair(data = data,
group.col = "definition",
group1 = "Primary solid Tumor",
group2 = "Solid Tissue Normal",
nearGenes = nearGenes,
minSubgroupFrac = 0.4, # % of samples to use in to create groups U/M
permu.dir = "result/permu",
permu.size = 100, # Please set to 100000 to get significant results
pvalue = 0.05,
Pe = 0.01, # Please set to 0.001 to get significant results
filter.probes = TRUE, # See preAssociationProbeFiltering function
filter.percentage = 0.05,
filter.portion = 0.3,
dir.out = "result",
cores = parallel::detectCores()/2,
label = "hypo")
datatable(Hypo.pair, ## significant probe-gene pairs
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = TRUE)
# get.pair automatically save output files.
#getPair.hypo.all.pairs.statistic.csv contains statistics for all the probe-gene pairs.
#getPair.hypo.pairs.significant.csv contains only the significant probes which is
# same with Hypo.pair.
dir(path = "result", pattern = "getPair") [1] "getPair.hypo.all.pairs.statistic.csv"
[2] "getPair.hypo.pairs.significant.csv"
[3] "getPair.hypo.pairs.significant.withmotif.csv"
[4] "getPair.hypo.pairs.statistic.with.empirical.pvalue.csv"
getPair.hypo.all.pairs.statistic.csv
getPair.hypo.pairs.significant.csv
getPair.hypo.pairs.significant.withmotif.csv
getPair.hypo.pairs.statistic.with.empirical.pvalue.csv
This step is to identify enriched motif in a set of probes which is carried out by function get.enriched.motif.
The build-in data Probes.motif is generated using HOMER with a \(p-value \le 10^{–4}\) to scan a \(\pm250bp\) region around each probe using HOmo sapiens COmprehensive MOdel COllection http://hocomoco.autosome.ru/ v10 (Kulakovskiy et al. 2016) position weight matrices (PWMs). HOCOMOCO offers 601 PMWs each has a quality rating from A to D where A represents motifs with the highest confidence, and D motifs only weakly describe the pattern with a limited applications for quantitative analyses. By default only quality A and B will be used for the Motif enrichment analysis, but the Minimum quality score can be selected by the user. (Addtional information http://hocomoco.autosome.ru/help#description_quality_score
http://nar.oxfordjournals.org/content/44/D1/D116.full#sec-8).
For each probe set tested (i.e. the list of gene-linked hypomethylated probes in a given experiment group), a motif enrichment Odds Ratio and a 95% confidence interval were calculated using following formulas: \[ p= \frac{a}{a+b} \] \[ P= \frac{c}{c+d} \] \[ Odds\quad Ratio = \frac{\frac{p}{1-p}}{\frac{P}{1-P}} \] \[ SD = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \] \[ lower\quad boundary\quad of\quad 95\%\quad confidence\quad interval = \exp{(\ln{OR}-SD)} \]
where a is the number of probes within the selected probe set that contain one or more motif occurrences; b is the number of probes within the selected probe set that do not contain a motif occurrence; c and d are the same counts within the entire enhancer probe set. A probe set was considered significantly enriched for a particular motif if the 95% confidence interval of the Odds Ratio was greater than 1.1 (specified by option lower.OR, 1.1 is default), and the motif occurred at least 10 times (specified by option min.incidence. 10 is default) in the probe set. As described in the text, Odds Ratios were also used for ranking candidate motifs.
There will be two results if save parameter of get.enriched.motif is TRUE. When save is FALSE, only second result mentioned below will be reported.
The first one is a csv file. This file contains the Odds Ratio and 95% confidence interval for these Odds Ratios which pass the signficant cutoff (lower.OR and min.incidence). It contains 5 columns: motif, NumOfProbes, OR, lowerOR and upperOR.
The second file is a rda file listing each enriched motif and the probes containing the motif.
The main arguments of this function are:
| Argument | Description |
|---|---|
| data | A multi Assay Experiment from createMAE function. If set and probes.motif/background probes are missing this will be used to get this other two arguments correctly. This argument is not require, you can set probes.motif and the backaground.probes manually. |
| probes.motif | A matrix contains motifs occurrence within probes regions. Probes.motif in ELMER.data will be used if probes.motif is missing (detail see Probes.motif.hg19.450K). |
| probes | A vector lists the name of probes to define the set of probes in which motif enrichment OR and confidence interval will be calculated. |
| min.motif.quality | Minimum motif quality score to consider. Possible valules: A, B, C , D, AS (A and S), BS (A, B and S), CS (A, B , C and S), DS (all - default) Description: Each PWM has a quality rating from A to D where A represents motifs with the highest confidence, and D motifs only weakly describe the pattern with a limited applications for quantitative analyses. Special S quality marks the single-box motifs (secondary motif). Source: http://hocomoco.autosome.ru/help#description_quality_score More information: http://nar.oxfordjournals.org/content/44/D1/D116.full#sec-8 |
| background.probes | A vector lists name of probes which are considered as background for motif.enrichment calculation (see detail). |
| lower.OR | A number specifies the smallest lower boundary of 95% confidence interval for Odds Ratio. The motif with higher lower boudnary of 95% confidence interval for Odds Ratio than the number are the significantly enriched motifs (detail see reference). |
| min.incidence | A non-negative integer specifies the minimum incidence of motif in the given probes set. 10 is default. |
| dir.out | A path. Specifies the directory for outputs. Default is current directory |
| label | A character. Labels the outputs such as “hypo”, “hyper” |
| save | If save is TURE, two files will be saved: getMotif.XX.enriched.motifs.rda and getMotif.XX.motif.enrichment.csv (see detail). |
## identify enriched motif for significantly hypomethylated probes which
##have putative target genes.
Sig.probes.paired <- read.csv("result/getPair.hypo.pairs.significant.csv",
stringsAsFactors=FALSE)[,1]
head(Sig.probes.paired) # significantly hypomethylated probes with putative target genes[1] "cg20701183" "cg19403323" "cg12213388" "cg26607897" "cg10574861"
[6] "cg26607897"
cg20701183
cg19403323
cg12213388
cg26607897
cg10574861
cg26607897
enriched.motif <- get.enriched.motif(data = data,
min.motif.quality = "B", # Default: options A, B, C, D
probes = Sig.probes.paired,
dir.out = "result",
label = "hypo",
min.incidence = 10,
lower.OR = 1.1)
names(enriched.motif) # enriched motifs [1] "CEBPA_HUMAN.H10MO.A" "EVX2_HUMAN.H10MO.A" "FOSL1_HUMAN.H10MO.A"
[4] "FOSL2_HUMAN.H10MO.A" "FOS_HUMAN.H10MO.A" "FOXJ3_HUMAN.H10MO.A"
[7] "IRF1_HUMAN.H10MO.A" "JUND_HUMAN.H10MO.A" "JUN_HUMAN.H10MO.A"
[10] "NANOG_HUMAN.H10MO.A" "NFAC2_HUMAN.H10MO.B" "NFAC3_HUMAN.H10MO.B"
[13] "P73_HUMAN.H10MO.A" "PBX1_HUMAN.H10MO.B" "PO2F1_HUMAN.H10MO.B"
[16] "SOX2_HUMAN.H10MO.B" "SRY_HUMAN.H10MO.B" "STAT1_HUMAN.H10MO.A"
[19] "STAT2_HUMAN.H10MO.B"
CEBPA_HUMAN.H10MO.A
EVX2_HUMAN.H10MO.A
FOSL1_HUMAN.H10MO.A
FOSL2_HUMAN.H10MO.A
FOS_HUMAN.H10MO.A
FOXJ3_HUMAN.H10MO.A
IRF1_HUMAN.H10MO.A
JUND_HUMAN.H10MO.A
JUN_HUMAN.H10MO.A
NANOG_HUMAN.H10MO.A
NFAC2_HUMAN.H10MO.B
NFAC3_HUMAN.H10MO.B
P73_HUMAN.H10MO.A
PBX1_HUMAN.H10MO.B
PO2F1_HUMAN.H10MO.B
SOX2_HUMAN.H10MO.B
SRY_HUMAN.H10MO.B
STAT1_HUMAN.H10MO.A
STAT2_HUMAN.H10MO.B
head(enriched.motif["P73_HUMAN.H10MO.A"]) ## probes in the given set that have TP53 motif.$P73_HUMAN.H10MO.A
[1] "cg24729839" "cg19896129" "cg13487474" "cg11157208" "cg15714328"
[6] "cg24296187" "cg25199536" "cg14420230" "cg18795022" "cg23981702"
[11] "cg20701183" "cg00329272" "cg18437839" "cg16578549" "cg15895782"
[16] "cg18857402" "cg24558378" "cg26883161" "cg01791421" "cg23972860"
[21] "cg21690505" "cg20146241" "cg23230478" "cg00935967" "cg18395632"
[26] "cg17181043" "cg21686171" "cg22684969" "cg12458003" "cg17907003"
[31] "cg24310780" "cg09814448"
cg24729839
cg19896129
cg13487474
cg11157208
cg15714328
cg24296187
cg25199536
cg14420230
cg18795022
cg23981702
cg20701183
cg00329272
cg18437839
cg16578549
cg15895782
cg18857402
cg24558378
cg26883161
cg01791421
cg23972860
cg21690505
cg20146241
cg23230478
cg00935967
cg18395632
cg17181043
cg21686171
cg22684969
cg12458003
cg17907003
cg24310780
cg09814448
# get.enriched.motif automatically save output files.
# getMotif.hypo.enriched.motifs.rda contains enriched motifs and the probes with the motif.
# getMotif.hypo.motif.enrichment.csv contains summary of enriched motifs.
dir(path = "result", pattern = "getMotif") [1] "getMotif.hypo.enriched.motifs.rda"
[2] "getMotif.hypo.motif.enrichment.csv"
getMotif.hypo.enriched.motifs.rda
getMotif.hypo.motif.enrichment.csv
# motif enrichment figure will be automatically generated.
dir(path = "result", pattern = "motif.enrichment.pdf") [1] "hypo.all.quality.motif.enrichment.pdf"
[2] "hypo.quality.A-B.motif.enrichment.pdf"
hypo.all.quality.motif.enrichment.pdf
hypo.quality.A-B.motif.enrichment.pdf
This step is to identify regulatory TF whose expression associates with TF binding motif DNA methylation which is carried out by function get.TFs.
For each motif considered to be enriched within a particular probe set, function will compare the average DNA methylation at all distal enhancer probes within \(\pm250bp\)
of a motif occurrence, to the expression of human TFs. A statistical test was performed for each motif-TF pair, as follows. The samples (all control and experiment samples) were divided into two groups: the M group, which consisted of the 20% of samples with the highest average methylation at all motif-adjacent probes, and the U group, which consisted of the 20% of samples with the lowest methylation. The 20th percentile cutoff is a parameter to the get.TFs function and was set to allow for identification of molecular subtypes present in 20% of cases. For each candidate motif-TF pair, the Mann-Whitney U test was used to test the null hypothesis that overall gene expression in group M was greater or equal than that in group U. This non-parametric test was used in order to minimize the effects of expression outliers, which can occur across a very wide dynamic range. For each motif tested, this resulted in a raw p-value (\(P_r\)) for each of the human TFs. All TFs were ranked by the \(-log_{10}(P_{r})\), and those falling within the top 5% of this ranking were considered candidate upstream regulators. The best upstream TFs which are known recognized the motif was automatically extracted as putative regulatory TFs.
If save parameter of get.TFs function is true, two files will be generated. If save parameter is FALSE, only a data frame containing the same content with the first file mentioned below will be output.
The first one csv file. This file contain the regulatory TF significantly associate with average DNA methylation at particular motif sites. It contains 6 columns: motif, top.potential.TF.family, top.potential.TF.subfamily, potential.TFs.family, potential.TFs.subfamily and top_5percent.
The second file is rda file. This file contains a matrix storing the statistic results for associations between TFs (row) and average DNA methylation at motifs (column). This matrix can be use to generate TF ranking plots by function TF.rank.plot.
The main arguments of this function are:
| Argument | Description |
|---|---|
| data | A multiAssayExperiment with DNA methylation and Gene Expression data. See createMAE function. |
| enriched.motif | A list containing output of get.enriched.motif function or a path of XX.rda file containing output of get.enriched.motif function. |
| TFs | A data.frame containing TF GeneID and Symbol or a path of XX.csv file containing TF GeneID and Symbol. If missing, human.TF list will be used (human.TF data in ELMER.data). For detail information, refer the reference paper. |
| motif.relevant.TFs | A list containing motif as names and relavent TFs as contents for each list element or a path of XX.rda file containing a list as above. If missing, TFClass classification will be used. See function createMotifRelevantTfs |
| group.col | A column defining the groups of the sample. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)). |
| group1 | A group from group.col. |
| group2 | A group from group.col. |
| minSubgroupFrac | A number ranging from 0 to 1 specifying the percentage of samples used to create the groups U (unmethylated) and M (methylated) used to link probes to TF expression. Default is 0.4 (lowest quintile of all samples will be in the U group and the highest quintile of all samples in the M group). |
| dir.out | A path specifies the directory for outputs of get.pair function. Default is current directory |
| label | A character labels the outputs. |
| cores | A interger which defines the number of cores to be used in parallel process. Default is 1: no parallel process. |
| save | A logic. If save is ture, two files will be saved: getTF.XX.significant.TFs.with.motif.summary.csv and getTF.hypo.TFs.with.motif.pvalue.rda (see detail). If save is false, a data frame contains the same content with the first file. |
## identify regulatory TF for the enriched motifs
TF <- get.TFs(data = data,
group.col = "definition",
group1 = "Primary solid Tumor",
group2 = "Solid Tissue Normal",
minSubgroupFrac = 1.0,
enriched.motif = enriched.motif,
dir.out = "result",
cores = 1,
label = "hypo")
# get.TFs automatically save output files.
# getTF.hypo.TFs.with.motif.pvalue.rda contains statistics for all TF with average
# DNA methylation at sites with the enriched motif.
# getTF.hypo.significant.TFs.with.motif.summary.csv contains only the significant probes.
dir(path = "result", pattern = "getTF")
# TF ranking plot based on statistics will be automatically generated.
dir(path = "result/TFrankPlot_family/", pattern = "pdf") Generate scatter plots for one probes’ nearby 20 gene expression vs DNA methylation at this probe.
Each scatter plot shows the methylation level of an example probe cg19403323 in all LUSC samples plotted against the expression of one of 20 adjacent genes.
scatter.plot(data,
byProbe = list(probe = c("cg19403323"), numFlankingGenes = 20),
category = "definition",
lm = TRUE, # Draw linear regression curve
save = FALSE) Each scatter plot shows the methylation level of an example probe cg19403323 in all LUSC samples plotted against the expression of one of 20 adjacent genes.
Generate a scatter plot for one probe-gene pair. Figure
Scatter plot shows the methylation level of an example probe cg19403323 in all LUSC samples plotted against the expression of the putative target gene SYT14.
scatter.plot(data,
byPair = list(probe = c("cg19403323"), gene = c("ENSG00000143469")),
category = "definition", save = TRUE, lm_line = TRUE) Scatter plot shows the methylation level of an example probe cg19403323 in all LUSC samples plotted against the expression of the putative target gene SYT14.
Generate scatter plot for TF expression vs average DNA methylation of the sites with certain motif.
Each scatter plot shows the average methylation level of sites with the TP53 motif in all LUSC samples plotted against the expression of the transcription factor TP53, TP63, TP73 respectively.
load("result/getMotif.hypo.enriched.motifs.rda")
scatter.plot(data,
byTF = list(TF = c("TP53","TP63","TP73"),
probe = enriched.motif[["P73_HUMAN.H10MO.A"]]),
category = "definition",
save = TRUE,
lm_line = TRUE)Each scatter plot shows the average methylation level of sites with the TP53 motif in all LUSC samples plotted against the expression of the transcription factor TP53, TP63, TP73 respectively.
Schematic plot shows a breif view of linkages between genes and probes.
Generate schematic plot for one probe with 20 nearby genes and label the gene significantly linked with the probe in red.
schematic.plot(pair = Hypo.pair,
data = data,
group.col = "definition",
byProbe = "cg25312122",
save = FALSE)The schematic plot shows probe colored in blue and the location of nearby 20 genes. The genes significantly linked to the probe were shown in red.
Generate schematic plot for one gene with the probes which the gene is significantly linked to.
schematic.plot(pair = Hypo.pair,
data = data,
group.col = "definition",
byGene = "ENSG00000143469",
save = FALSE)The schematic plot shows the gene colored in red and all blue colored probes, which are significantly linked to the expression of this gene.
Motif enrichment plot shows the enrichment levels for the selected motifs.
The plot shows the Odds Ratio (x axis) for the selected motifs with OR above 1.3 and lower boundary of OR above 1.3. The range shows the 95% confidence interval for each Odds Ratio
motif.enrichment.plot(motif.enrichment = "result/getMotif.hypo.motif.enrichment.csv",
significant = list(OR = 1.3,lowerOR = 1.3),
label = "hypo",
save = TRUE) ## different signficant cut off.The plot shows the Odds Ratio (x axis) for the selected motifs with OR above 1.3 and lower boundary of OR above 1.3. The range shows the 95% confidence interval for each Odds Ratio.
TF ranking plot shows statistic \(-log_{10}(P value)\) assessing the anti-correlation level of TFs expression level with average DNA methylation level at sites with a given motif. By default, the top 3 associated TFs and the TF family members (dots in red) that are associated with that specific motif are labeled in the plot. But there is also a option show highlight only TF sub-family members (TCClass database classification)
Shown are TF ranking plots based on the score (\(-log_{10}(P value))\) of association between TF expression and DNA methylation of the P73 motif in the LUSC cancer type. The dashed line indicates the boundary of the top 5% association score. The top 3 associated TFs and the TF family members=(dots in red) that are associated with that specific motif are labeled in the plot
load("result/getTF.hypo.TFs.with.motif.pvalue.rda")
motif <- "P73_HUMAN.H10MO.A"
TF.rank.plot(motif.pvalue = TF.meth.cor,
motif = motif,
save = FALSE) Accessing hocomoco to get last version of TFs family
$P73_HUMAN.H10MO.A
Shown are TF ranking plots based on the score (-log(P value)) of association between TF expression and DNA methylation of the P73 motif in the LUSC cancer type. The dashed line indicates the boundary of the top 5% association score. The top 3 associated TFs and the TF family members=(dots in red) that are associated with that specific motif are labeled in the plot
Shown are TF ranking plots based on the score (\(-log_{10}(P value)\)) of association between TF expression and DNA methylation of the P73 motif in the LUSC cancer type. The dashed line indicates the boundary of the top 5% association score. The top 3 associated TFs and the TF sub-family members=(dots in red) that are associated with that specific motif are labeled in the plot.
load("result/getTF.hypo.TFs.with.motif.pvalue.rda")
motif <- "P73_HUMAN.H10MO.A"
TF.rank.plot(motif.pvalue = TF.meth.cor,
motif = motif,
TF.label = createMotifRelevantTfs("subfamily")[motif],
save = FALSE)Accessing hocomoco to get last version of TFs subfamily
$P73_HUMAN.H10MO.A
Shown are TF ranking plots based on the score (-log(P value)) of association between TF expression and DNA methylation of the P73 motif in the LUSC cancer type. The dashed line indicates the boundary of the top 5% association score. The top 3 associated TFs and the TF sub-family members=(dots in red) that are associated with that specific motif are labeled in the plot
sessionInfo()R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] webshot_0.4.1 bindrcpp_0.1
[3] SummarizedExperiment_1.6.3 DelayedArray_0.2.7
[5] matrixStats_0.52.2 Biobase_2.36.2
[7] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2
[9] IRanges_2.10.2 S4Vectors_0.14.3
[11] BiocGenerics_0.22.0 MultiAssayExperiment_1.2.1
[13] DT_0.2 BiocStyle_2.4.0
[15] ELMER_2.0.0 ELMER.data_2.0.0
loaded via a namespace (and not attached):
[1] shinydashboard_0.6.0 R.utils_2.5.0
[3] htmlwidgets_0.8 RSQLite_1.1-2
[5] AnnotationDbi_1.38.1 grid_3.4.0
[7] trimcluster_0.1-2 BiocParallel_1.10.1
[9] devtools_1.13.2.9000 DESeq_1.28.0
[11] munsell_0.4.3 codetools_0.2-15
[13] withr_1.0.2 colorspace_1.3-2
[15] GOSemSim_2.2.0 BiocInstaller_1.26.0
[17] highr_0.6 knitr_1.16
[19] supraHex_1.14.0 robustbase_0.92-7
[21] DOSE_3.2.0 pathview_1.16.0
[23] labeling_0.3 KEGGgraph_1.34.0
[25] GenomeInfoDbData_0.99.0 mnormt_1.5-5
[27] hwriter_1.3.2 KMsurv_0.1-5
[29] rprojroot_1.2 downloader_0.4
[31] biovizBase_1.24.0 ggthemes_3.4.0
[33] EDASeq_2.10.0 diptest_0.75-7
[35] R6_2.2.1 doParallel_1.0.10
[37] locfit_1.5-9.1 AnnotationFilter_1.0.0
[39] flexmix_2.3-14 bitops_1.0-6
[41] reshape_0.8.6 fgsea_1.2.1
[43] assertthat_0.2.0 scales_0.4.1
[45] nnet_7.3-12 gtable_0.2.0
[47] ensembldb_2.0.3 rlang_0.1.1.9000
[49] genefilter_1.58.1 cmprsk_2.2-7
[51] GlobalOptions_0.0.12 splines_3.4.0
[53] rtracklayer_1.36.3 lazyeval_0.2.0
[55] acepack_1.4.1 dichromat_2.0-0
[57] hexbin_1.27.1 selectr_0.3-1
[59] checkmate_1.8.2 broom_0.4.2
[61] yaml_2.1.14 reshape2_1.4.2
[63] crosstalk_1.0.0 GenomicFeatures_1.28.3
[65] backports_1.1.0 httpuv_1.3.3
[67] qvalue_2.8.0 Hmisc_4.0-3
[69] clusterProfiler_3.4.3 tools_3.4.0
[71] psych_1.7.5 ggplot2_2.2.1
[73] RColorBrewer_1.1-2 Rcpp_0.12.11
[75] plyr_1.8.4 base64enc_0.1-3
[77] zlibbioc_1.22.0 purrr_0.2.2.2
[79] RCurl_1.95-4.8 ggpubr_0.1.3.999
[81] rpart_4.1-11 GetoptLong_0.1.6
[83] viridis_0.4.0 zoo_1.8-0
[85] ggrepel_0.6.5 cluster_2.0.6
[87] magrittr_1.5 data.table_1.10.4
[89] DO.db_2.9 circlize_0.4.0
[91] survminer_0.4.0.999 mvtnorm_1.0-6
[93] whisker_0.3-2 ProtGenerics_1.8.0
[95] pkgload_0.0.0.9000 aroma.light_3.6.0
[97] mime_0.5 hms_0.3
[99] evaluate_0.10 xtable_1.8-2
[101] XML_3.98-1.7 mclust_5.3
[103] gridExtra_2.2.1 shape_1.4.2
[105] testthat_1.0.2 compiler_3.4.0
[107] biomaRt_2.32.1 tibble_1.3.3
[109] crayon_1.3.2 R.oo_1.21.0
[111] htmltools_0.3.6 Formula_1.2-1
[113] tidyr_0.6.3 geneplotter_1.54.0
[115] DBI_0.6-1 matlab_1.0.2
[117] ComplexHeatmap_1.14.0 MASS_7.3-47
[119] GenomicInteractions_1.10.0 fpc_2.1-10
[121] ShortRead_1.34.0 Matrix_1.2-10
[123] readr_1.1.1 R.methodsS3_1.7.1
[125] bindr_0.1 Gviz_1.20.0
[127] igraph_1.0.1 km.ci_0.5-2
[129] rvcheck_0.0.8 GenomicAlignments_1.12.1
[131] foreign_0.8-68 plotly_4.7.0
[133] InteractionSet_1.4.0 xml2_1.1.1
[135] foreach_1.4.3 annotate_1.54.0
[137] XVector_0.16.0 rvest_0.3.2
[139] VariantAnnotation_1.22.1 stringr_1.2.0
[141] digest_0.6.12 ConsensusClusterPlus_1.40.0
[143] graph_1.54.0 Biostrings_2.44.1
[145] rmarkdown_1.5 fastmatch_1.1-0
[147] htmlTable_1.9 TCGAbiolinks_2.5.4
[149] survMisc_0.5.4 dendextend_1.5.2
[151] edgeR_3.18.1 curl_2.6
[153] kernlab_0.9-25 shiny_1.0.3.9000
[155] Rsamtools_1.28.0 modeltools_0.2-21
[157] rjson_0.2.15 nlme_3.1-131
[159] jsonlite_1.5 BSgenome_1.44.0
[161] viridisLite_0.2.0 limma_3.32.2
[163] lattice_0.20-35 KEGGREST_1.16.0
[165] httr_1.2.1 DEoptimR_1.0-8
[167] pkgbuild_0.0.0.9000 survival_2.41-3
[169] GO.db_3.4.1 interactiveDisplayBase_1.14.0
[171] glue_1.1.0 UpSetR_1.3.3
[173] png_0.1-7 prabclus_2.2-6
[175] iterators_1.0.8 Rgraphviz_2.20.0
[177] class_7.3-14 stringi_1.1.5
[179] AnnotationHub_2.8.2 latticeExtra_0.6-28
[181] memoise_1.1.0 dplyr_0.7.0
[183] ape_4.1
Colaprico, Antonio, Tiago C Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S Sabedot, et al. 2015. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of Tcga Data.” Nucleic Acids Research. Oxford Univ Press, gkv1507.
Durinck, Steffen, Paul T Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the R/Bioconductor Package BiomaRt.” Nature Protocols 4 (8). Nature Publishing Group: 1184–91.
Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1). Oxford Univ Press: D116–D125.
Silva, TC, A Colaprico, C Olsen, F D’Angelo, G Bontempi, M Ceccarelli, and H Noushmehr. 2016. “TCGA Workflow: Analyze Cancer Genomics and Epigenomics Data Using Bioconductor Packages [Version 2; Referees: 1 Approved, 2 Approved with Reservations].” F1000Research 5 (1542). doi:10.12688/f1000research.8923.2.
Yates, Andrew, Wasiu Akanni, M Ridwan Amode, Daniel Barrell, Konstantinos Billis, Denise Carvalho-Silva, Carla Cummins, et al. 2015. “Ensembl 2016.” Nucleic Acids Research. Oxford Univ Press, gkv1157.
Zhou, Wanding, Peter W Laird, and Hui Shen. 2016. “Comprehensive Characterization, Annotation and Innovative Use of Infinium Dna Methylation Beadchip Probes.” Nucleic Acids Research. Oxford Univ Press, gkw967.